Search for: All records

Creators/Authors contains: "Denslow, Michael"


  1. Abstract. Premise: One of the slowest steps in digitizing natural history collections is converting labels associated with specimens into a digital data record usable for collections management and research. Here, we address how herbarium specimen labels can be converted into digital data records via extraction into standardized Darwin Core fields. Methods: We first showcase the development of a rule‐based approach and compare outcomes with a large language model–based approach, in particular ChatGPT4. We next quantified omission and commission error rates across target fields for a set of labels transcribed using optical character recognition (OCR) for both approaches. For example, we find that ChatGPT4 often creates field names that are not Darwin Core compliant while rule‐based approaches often have high commission error rates. Results: Our results suggest that these approaches each have different strengths and limitations. We therefore developed an ensemble approach that leverages the strengths of each individual method and documented that ensembling strongly reduced overall information extraction errors. Discussion: This work shows that an ensemble approach has particular value for creating high‐quality digital data records, even for complicated label content. While human validation is still needed to ensure the best possible quality, automated approaches can speed digitization of herbarium specimen labels and are likely to be broadly usable for all natural history collection types.
    Free, publicly-accessible full text available November 5, 2025
  2. Abstract. Premise: Among the slowest steps in the digitization of natural history collections is converting imaged labels into digital text. We present here a working solution to overcome this long‐recognized efficiency bottleneck that leverages synergies between community science efforts and machine learning approaches. Methods: We present two new semi‐automated services. The first detects and classifies typewritten, handwritten, or mixed labels from herbarium sheets. The second uses a workflow tuned for specimen labels to label text using optical character recognition (OCR). The label finder and classifier was built via humans‐in‐the‐loop processes that utilize the community science Notes from Nature platform to develop training and validation data sets to feed into a machine learning pipeline. Results: Our results showcase a >93% success rate for finding and classifying main labels. The OCR pipeline optimizes pre‐processing, multiple OCR engines, and post‐processing steps, including an alignment approach borrowed from molecular systematics. This pipeline yields >4‐fold reductions in errors compared to off‐the‐shelf open‐source solutions. The OCR workflow also allows human validation using a custom Notes from Nature tool. Discussion: Our work showcases a usable set of tools for herbarium digitization including a custom‐built web application that is freely accessible. Further work to better integrate these services into existing toolkits can support broad community use.
  3. A comprehensive overview of volunteer-driven public programs focused on activities to enhance natural history collections (NHCs) is provided. The initiative revolves around the WeDigBio events and the Collections Club at the Field Museum, aiming to deepen the public’s connection with scientific collections, enhance participatory science, and improve data associated with natural history specimens. The implementation and journey of these programs are outlined, including surveys conducted from 2015 through 2021 to gauge participant motivation, satisfaction, and the impact of these events on public engagement with NHCs. Results show trends in on-site and virtual volunteer participation over the years, especially during the peak period of the COVID-19 pandemic. The majority of participants expressed high satisfaction, indicating a willingness to continue participating in similar activities. The surveys revealed a shift towards more altruistic motivations for participation over time, with increased emphasis on supporting the Field Museum and contributing to the scientific community. The success of participatory science events demonstrates the potential of volunteer-driven programs to contribute meaningfully to the preservation, digitisation, and understanding of biodiversity collections, ultimately transforming spectators into stewards of natural history. Participants celebrate a significant milestone: from 2015 to the present, over a thousand community scientists have contributed to the inventorying, collection care, curation, databasing, or transcription of 286,071 specimens, objects, or records. We also discuss accuracy and quality control as well as a checklist and recommendations for similar activities.
    Free, publicly-accessible full text available December 18, 2025